Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFiClient - rename flush() to clear() #9453

Merged
merged 2 commits into from
Apr 5, 2024

Conversation

JAndrassy
Copy link
Contributor

@JAndrassy JAndrassy commented Apr 3, 2024

this is a repeat of #8871 after networking refactoring.

With a major version release the flush() dilemma could be resolved for WiFiClient & co. as last on this platform.

I renamed the flush which cleared the rx buffer to clear(). but flush is pure virtual in Client so I had to add empty flush()
The position of the function in file is to group Print (write) related and Stream (read) related functions.

If Arduino adds to Stream class a method to clear the input buffer, it will be most probably named clear() because Stream is taken from Processing and there it has clear().

I analyzed the API of 18 Arduino networking libraries. ESP32 is the only one with wrong implementation of flush().
There is a very bad consequence of using rx clearing flush() instead of proper flush(). It means RX data loss.

Copy link
Contributor

github-actions bot commented Apr 3, 2024

Warnings
⚠️

Some issues found for the commit messages in this PR:

  • the commit message "WiFiClient - rename flush() to clear() (breaking)":
    • summary looks empty
    • type/action looks empty

Please fix these commit messages - here are some basic tips:

  • follow Conventional Commits style
  • correct format of commit message should be: <type/action>(<scope/component>): <summary>, for example fix(esp32): Fixed startup timeout issue
  • allowed types are: change,ci,docs,feat,fix,refactor,remove,revert,test
  • sufficiently descriptive message summary should be between 20 to 72 characters and start with upper case letter
  • avoid Jira references in commit messages (unavailable/irrelevant for our customers)

TIP: Install pre-commit hooks and run this check when committing (uses the Conventional Precommit Linter).

👋 Hello JAndrassy, we appreciate your contribution to this project!


Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- Resolve all warnings (⚠️ ) before requesting a review from human reviewers - they will appreciate it.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests.

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
4. If the change is approved and passes the tests it is merged into the default branch.

Generated by 🚫 dangerJS against 5cb167f

Copy link
Collaborator

@lucasssvaz lucasssvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. IMO it just needs a better wording on the documentation. @me-no-dev Do you think we can add this to the breaking changes of 3.0.0 ?

docs/en/migration_guides/2.x_to_3.0.rst Outdated Show resolved Hide resolved
@lucasssvaz lucasssvaz added Area: Peripherals API Relates to peripheral's APIs. Breaking change 🔥 Area: WiFi Issue related to WiFi labels Apr 3, 2024
@me-no-dev
Copy link
Member

@lucasssvaz looks fine to me too.

@lucasssvaz lucasssvaz added the Status: Pending Merge Pull Request is ready to be merged label Apr 4, 2024
@lucasssvaz lucasssvaz changed the title WiFiClient - rename flush() to clear() (breaking) WiFiClient - rename flush() to clear() Apr 4, 2024
@me-no-dev me-no-dev merged commit 9b32541 into espressif:master Apr 5, 2024
48 checks passed
@JAndrassy JAndrassy deleted the network_flush_to_clear branch April 5, 2024 11:42
@TD-er
Copy link
Contributor

TD-er commented May 29, 2024

There should have been a compile-time warning whenever those now empty flush() calls are being called.
Now we have to go through all code to see what has been broken by this change (and yes it does have consequences, this change)

So for any similar changes in future PRs, please make compile time warnings whenever those are being called as it is obviously a legacy mistake when you make a call to a function which is now empty and did do stuff before.
Those warnings can be temporary until the new major release of the SDK is being released, but it would prevent lots of lost hours in searching why stuff is now no longer working like before.

For example in my project I have searched for countless hours to see why some of my boards suddenly refused to respond to new calls via the webserver and no idea what caused this as it was nearly impossible to reproduce.

@JAndrassy
Copy link
Contributor Author

@TD-er
the unread chars were in the way?
you can add the warning in your installation of the platform, until you resolve the occurrences

@TD-er
Copy link
Contributor

TD-er commented May 29, 2024

In my specific use case I have a wrapper for handling chunked transfers when sending replies to HTTP requests.
And this used to work very well, also when handling several quick requests in a row like the browser asking for the CSS while loading the page.
However with updating my code to comply with this change (I changed the code to use clear() instead of flush() for the webserver.client()) it is now working as it used to do.

Without my change, the device would appear to be inaccessible for quite a while to just about any HTTP request when it had to serve multiple requests in a burst.

The obvious symptoms were:

  • CSS not loading
  • JavaScript not loading
  • JSON's were not fetched.

But my main issue with this is that it is a very significant change, which should have had some compile-time warnings about the API changes.
So for other similar changes, please consider adding such warnings as this really has taken quite a lot of time for me to search for the cause as I really had not thought of this change.

@JAndrassy
Copy link
Contributor Author

in this case it was not possible to add a warning as the flush() function is a valid function (pure virtual in the Client class) and it is not possible to know when it is used wrong

@TD-er
Copy link
Contributor

TD-er commented Aug 24, 2024

@JAndrassy
Just to let you know the utter frustration I have from this PR being merged as it keeps biting me over and over again.

in this case it was not possible to add a warning as the flush() function is a valid function (pure virtual in the Client class) and it is not possible to know when it is used wrong

This is exactly why it is a real pain in the *ss to have this PR being merged.

You simply don't know when it is being used wrong, not at compile time, not at runtime. You only keep seeing those symptoms.

In the platform packages provided by Jason for me to be able to implement support for ESP-IDF5.x this PR was reverted as it is nearly impossible to find all occurences where this is used incorrectly.
However when using 'vanilla' Arduino code I keep running into these really really annoying issues caused by this PR where you run into vague and hard to reproduce timeouts, failed connections, hanging device as clients used up all available connections, stalls, etc.

I do understand that maybe in the past there may have been a poor choice in function naming. But the way this is now 'fixed' is NOT how it should have done as it is simply breaking stuff and you can't tell where it is all broken unless spending countless hours fixing stuff and with any new lib that may be relying on the older implementation you can do it all over again.
And worst of all as I need to keep multiple versions of the SDK compilable it really messes up the code which should not even have to worry about this at all.

All those hours needlessly spent on these kind of issues cannot be spent to make sure I can get rid of those older SDK requirements or even implement nice new features.

@TD-er
Copy link
Contributor

TD-er commented Aug 24, 2024

IMHO the best way forward would be to generally revert this PR and if there really is a need to make similar changes, there should be found a way to make it work so devs can see at compile-time what needs to be changed.

@JAndrassy
Copy link
Contributor Author

why is it hard to find occurrences of "flush()" in the source code?

@TD-er
Copy link
Contributor

TD-er commented Aug 24, 2024

You just can't replace flush() with clear() in the code.
If it was as simple like that, then there wouldn't be an empty flush() function in the code, right?
There is apparently a need for both functions in some base class.

As I also need to have my code to work with SDK versions not using this PR.
So this would then result in some global define where based on the platform code you define it either as flush() or clear().
This can also be required in library code, which must then be adapted too and made sure to keep those changes when updating such a library.

So this is only enforcing the point I tried to make before as why this PR is causing a mess for developers.
You need to keep track of stuff where you should not need to worry about it.

@TD-er
Copy link
Contributor

TD-er commented Aug 24, 2024

While you're at it, can you also take a look at these:

_udp_ota.flush(); // always flush, even zero length packets must be flushed.

That's exactly in the area where multiple reports have been filed lately, so I guess it might be related.

@JAndrassy
Copy link
Contributor Author

JAndrassy commented Aug 24, 2024

in Parsing.cpp the client.flush() is at the end of the processing of the HTTP request. the response is already sent. If the connection is not kept alive for the next request, then it is closed so why clearing it. If the connection is kept alive for the following HTTP request, then clear() could delete beginning of the following request. But characters remaining in the buffer would make the following request invalid too. So for a reused connection the processing of the request must ensure to read until the exact end of the request.

In ArduinoOTA the UDP object is used to receive and send messages. The function _onRx (invoked before that flush()) reads a message and then sends a response. So it is unclear what the flush() was meant to do. If espota.py uploads work then everything is OK. But it would fail if _onRx doesn't read the whole incoming message because then all following parsePacket will always return 0 (and this is not a good behavior of the NetworkUDP class).

@TD-er
Copy link
Contributor

TD-er commented Aug 24, 2024

One of the git blame of these 2 was about a commit of 6+ years ago where the last commit text mentioned something about it being added intentionally. (some commit inbetween about code formatting)
I'm on an extremely slow laptop right now, so I can't get these git blames to load here.

Regardless if it might be some code smell of a problem elsewhere, it does seem like those 2 had to be changed too when this PR was merged to keep at least the functionality the same.
Do you agree those two occurences should be changed to .clear() too?
If so, then I will make a PR (or you can do one) later this evening.

@Jason2866
Copy link
Collaborator

Jason2866 commented Aug 25, 2024

Still the best solution would to revert this PR completely. The regressions caused by this PR are hughe. Even two follow up PRs are already needed to fix the core itself! How many will be needed at end just for the core itself?
The situation is worse with the many many libs out there. The are all incompatible now.
Please @me-no-dev revert this PR!!

@me-no-dev
Copy link
Member

How about instead of reverting it, we have flush() to call clear()? Would that help you all?

@TD-er
Copy link
Contributor

TD-er commented Aug 26, 2024

I think that's a nice compromise.
Maybe also add some deprecated attribute if possible, so whenever it is used the compiler will warn?

me-no-dev added a commit that referenced this pull request Aug 26, 2024
This is a compromise for issues caused by #9453
@me-no-dev
Copy link
Member

PR with clear and deprecation created: #10242

@Jason2866
Copy link
Collaborator

@me-no-dev Good solution, no silent breaking change anymore. Now there is an info that a mandatory change has be done and still all code is working as expected.

me-no-dev added a commit that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Peripherals API Relates to peripheral's APIs. Area: WiFi Issue related to WiFi Breaking change 🔥 Status: Pending Merge Pull Request is ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants